PATENT 
MS No. 306980.01 
L&H Docket No. MCS-069-03 

A SYSTEM AND METHOD FOR VISUAL ECHO CANCELLATION IN 
A PROJECTOR-CAMERA-WHITEBOARD SYSTEM 

BACKGROUND
Technical Field: 

The invention is related to a system that incorporates a whiteboard into a 
projector-camera system. More particularly, this invention is related to a system
and method for transmitting a clear image of a whiteboard work surface for 
remote collaboration by employing a visual echo cancellation technique. 

Related Art: 

During the past few years, video cameras and projectors have been
transformed from expensive lab equipment into affordable consumer
products. This has triggered the creation of many human-computer interaction
systems that incorporate both the large-scale display provided by the projector
and intelligent feedback from one or more cameras. On the other hand, the
whiteboard is still an indispensable part of many meetings (including lecturing, 
presentations and brainstorming), because it provides a large shared space for 
meeting participants to focus their attention and express and exchange their 
ideas spontaneously. 

Previous works have integrated whiteboards within a projector-camera 
system. However, these previous works mostly focused on group collaboration 
at the same physical location. The use of a whiteboard where there are remote
participants, on the other hand, poses a variety of problems. For example, 
remote meeting participants cannot contribute to the content written on a 
whiteboard because they are not physically in the same place as the whiteboard. 
Additionally, sending images of the whiteboard to remote participants via a
network uses a great amount of network bandwidth and is often quite slow.

Therefore, what is needed is a system and method for capturing and 
transmitting the contents of writing on a whiteboard that can be transferred to 
remote meeting participants via a network in a legible manner without requiring a
large bandwidth. The system should also allow remote participants to contribute 
to the whiteboard content and allow for the whiteboard content to be archived for 
viewing at a later date. 

SUMMARY 

The system and method of the present invention integrates a whiteboard 
into a projector-camera system by using the whiteboard both as a writing surface 
and a projecting surface. The invention captures an image sequence or video of
the whiteboard with a camera and separates the writing on the whiteboard from 
the contents projected on it. 

By adding the extracted writings digitally on the original computer 
presentation for display at remote sites, remote attendees can have a better
viewing experience than just viewing the transmitted display content where the 
writing on the whiteboard is not available. They also have a better viewing 
experience than just viewing the raw whiteboard video in which the quality of the 
original display content is poor. Furthermore, if remote attendees send an 
annotation to the local meeting room, the annotation will be projected, but it will
not be sent back to the remote site. By analogy with echo cancellation in audio 
conferencing, this is called visual echo cancellation. Visual echo, by strict
definition, is the appearance of the projected annotation (from remote attendees) 
viewed by the camera. However, since the computer presentation is sent to 
remote attendees for better viewing quality, one must also consider the 
appearance of the projected presentation as visual echo. Visual echo
cancellation is then achieved if one can extract only the physical writings from the
video or sequence of images of the items projected on the whiteboard. For this, 
one needs an accurate prediction of the appearance of the computer-projected 
content viewed by the camera. This, in turn, requires two basic components: 
geometric calibration and color calibration. Geometric calibration concerns the
mapping between the position in the camera view and the position in the 
projector screen, namely the whiteboard in this case. Color calibration concerns 
the mapping between the actual color of the projected content and that seen by 
the camera. 

A typical process by which the system and method according to the 
invention can be employed is as follows. Initially, geometric calibration data and 
color calibration data are obtained. Images or video of the whiteboard having 
projected items thereon, such as for example, a projected presentation or
annotations made by remote participants, as well as writings written on the
physical whiteboard are captured. Then, the visual echo for a given captured 
image is computed. Visual echo cancellation is then used to isolate the writings 
written on the whiteboard from the projected content (e.g., the remote 
participant's annotations and the electronic presentation). The writings may be
sent to the remote participants to be displayed at a remote display in conjunction
with the transmitted presentation and the annotations made by the remote 
participants. Similarly, the writings may be archived for future viewing. 



Several immediate advantages of the system and method according to the 
invention are that computer presentations and whiteboard discussions are
seamlessly integrated into one session. Meeting attendees are not distracted by
switching their attention from a projector screen to the whiteboard, and vice versa.
Furthermore, such a system enables local and remote attendees to collaborate 
with each other on a single mutually shared workspace. Local attendees have a 
much more natural writing surface than if a commercial large display product, 
such as an electronic whiteboard, is used. Most importantly, the system can be
easily deployed on top of current meeting environments. The system and 
method of the present invention is therefore much more economical than most 
large display products that require installing specialized and usually expensive 
equipment and accessories.

In addition to the just described benefits, other advantages of the present 
invention will become apparent from the detailed description which follows 
hereinafter when taken in conjunction with the accompanying drawing figures. 

DESCRIPTION OF THE DRAWINGS 

The specific features, aspects, and advantages of the present invention 
will become better understood with regard to the following description, appended 
claims, and accompanying drawings where:

FIG. 1 is a general system diagram depicting a general-purpose 
computing device constituting an exemplary system for implementing the present 
invention. 

FIG. 2 illustrates a projector-camera-whiteboard system according to the 
present invention. 

FIG. 3 illustrates an exemplary flow diagram for the general process of 
separating projected content and whiteboard writings according to the present
invention.

FIG. 4 illustrates a flow diagram for the geometric calibration process
according to the present invention. 

FIG. 5 illustrates a flow diagram for the corner detection process
employed by the system and method according to the present invention.

FIG. 6 illustrates a flow diagram for the color calibration process employed 
by the system and method according to the present invention. 

FIG. 7 illustrates a flow diagram for estimating visual echoes employed by 
the system and method according to the present invention. 

FIG. 8 illustrates a flow diagram for canceling visual echoes employed by 
the system and method according to the present invention.

FIG. 9 illustrates experimental results obtained by using the visual echo
cancellation system and method according to the present invention. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 

In the following description of the preferred embodiments of the present 
invention, reference is made to the accompanying drawings, which form a part 
hereof, and in which is shown by way of illustration specific embodiments in
which the invention may be practiced. It is understood that other embodiments 
may be utilized and structural changes may be made without departing from the 
scope of the present invention. 

1.0 Exemplary Operating Environment:

Figure 1 illustrates an example of a suitable computing system
environment 100 on which the invention may be implemented. The computing 
system environment 100 is only one example of a suitable computing 
environment and is not intended to suggest any limitation as to the scope of use 
or functionality of the invention. Neither should the computing environment 100
be interpreted as having any dependency or requirement relating to any one or 
combination of components illustrated in the exemplary operating environment 
100. 

The invention is operational with numerous other general purpose or
special purpose computing system environments or configurations. Examples of
well known computing systems, environments, and/or configurations that may be 
suitable for use with the invention include, but are not limited to, personal 
computers, server computers, hand-held, laptop or mobile computer or
communications devices such as cell phones and PDAs, multiprocessor
systems, microprocessor-based systems, set top boxes, programmable 
consumer electronics, network PCs, minicomputers, mainframe computers, 
distributed computing environments that include any of the above systems or 
devices, and the like. 

The invention may be described in the general context of computer- 
executable instructions, such as program modules, being executed by a 
computer. Generally, program modules include routines, programs, objects, 
components, data structures, etc. that perform particular tasks or implement
particular abstract data types. The invention may also be practiced in distributed
computing environments where tasks are performed by remote processing 
devices that are linked through a communications network. In a distributed 
computing environment, program modules may be located in both local and 
remote computer storage media including memory storage devices. With
reference to Figure 1, an exemplary system for implementing the invention
includes a general-purpose computing device in the form of a computer 110.

Components of computer 110 may include, but are not limited to, a
processing unit 120, a system memory 130, and a system bus 121 that couples 
various system components including the system memory to the processing unit
120. The system bus 121 may be any of several types of bus structures
including a memory bus or memory controller, a peripheral bus, and a local bus
using any of a variety of bus architectures. By way of example, and not 
limitation, such architectures include Industry Standard Architecture (ISA) bus, 
Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video
Electronics Standards Association (VESA) local bus, and Peripheral Component
Interconnect (PCI) bus also known as Mezzanine bus. 

Computer 110 typically includes a variety of computer readable media. 
Computer readable media can be any available media that can be accessed by
computer 110 and includes both volatile and nonvolatile media, removable and
non-removable media. By way of example, and not limitation, computer readable 
media may comprise computer storage media and communication media. 
Computer storage media includes volatile and nonvolatile removable and non- 
removable media implemented in any method or technology for storage of
information such as computer readable instructions, data structures, program
modules or other data. Computer storage media includes, but is not limited to, 
RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, 
digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, 
magnetic tape, magnetic disk storage or other magnetic storage devices, or any
other medium which can be used to store the desired information and which can
be accessed by computer 110. Communication media typically embodies 
computer readable instructions, data structures, program modules or other data 
in a modulated data signal such as a carrier wave or other transport mechanism 
and includes any information delivery media. The term "modulated data signal"
means a signal that has one or more of its characteristics set or changed in such
a manner as to encode information in the signal. By way of example, and not
limitation, communication media includes wired media such as a wired network
or direct-wired connection, and wireless media such as acoustic, RF, infrared 
and other wireless media. Combinations of any of the above should also be 
included within the scope of computer readable media. 

The system memory 130 includes computer storage media in the form of 
volatile and/or nonvolatile memory such as read only memory (ROM) 131 and 
random access memory (RAM) 132. A basic input/output system 133 (BIOS), 
containing the basic routines that help to transfer information between elements 
within computer 110, such as during start-up, is typically stored in ROM 131.
RAM 132 typically contains data and/or program modules that are immediately 
accessible to and/or presently being operated on by processing unit 120. By way 
of example, and not limitation, Figure 1 illustrates operating system 134, 
application programs 135, other program modules 136, and program data 137. 

The computer 110 may also include other removable/non-removable, 
volatile/nonvolatile computer storage media. By way of example only, Figure 1 
illustrates a hard disk drive 141 that reads from or writes to non-removable, 
nonvolatile magnetic media, a magnetic disk drive 151 that reads from or writes
to a removable, nonvolatile magnetic disk 152, and an optical disk drive 155 that
reads from or writes to a removable, nonvolatile optical disk 156 such as a CD 
ROM or other optical media. Other removable/non-removable, 
volatile/nonvolatile computer storage media that can be used in the exemplary 
operating environment include, but are not limited to, magnetic tape cassettes,
flash memory cards, digital versatile disks, digital video tape, solid state RAM,
solid state ROM, and the like. The hard disk drive 141 is typically connected to 
the system bus 121 through a non-removable memory interface such as interface 
140, and magnetic disk drive 151 and optical disk drive 155 are typically 
connected to the system bus 121 by a removable memory interface, such as
interface 150.

The drives and their associated computer storage media discussed above
and illustrated in Figure 1, provide storage of computer readable instructions, 
data structures, program modules and other data for the computer 110. In Figure
1, for example, hard disk drive 141 is illustrated as storing operating system 144,
application programs 145, other program modules 146, and program data 147.
Note that these components can either be the same as or different from 
operating system 134, application programs 135, other program modules 136, 
and program data 137. Operating system 144, application programs 145, other 
program modules 146, and program data 147 are given different numbers here to
illustrate that, at a minimum, they are different copies. A user may enter
commands and information into the computer 110 through input devices such as
a keyboard 162 and pointing device 161, commonly referred to as a mouse, 
trackball or touch pad. Other input devices (not shown) may include a 
microphone, joystick, game pad, satellite dish, scanner, or the like. These and
other input devices are often connected to the processing unit 120 through a user
input interface 160 that is coupled to the system bus 121, but may be connected
by other interface and bus structures, such as a parallel port, game port or a 
universal serial bus (USB). A monitor 191 or other type of display device is also 
connected to the system bus 121 via an interface, such as a video interface 190.
In addition to the monitor, computers may also include other peripheral output
devices such as speakers 197 and printer 196, which may be connected through 
an output peripheral interface 195. 

Further, the computer 110 may also include, as an input device, a camera
192 (such as a digital/electronic still or video camera, or film/photographic
scanner) capable of capturing a sequence of images 193. Further, while just one
camera 192 is depicted, multiple cameras could be included as input devices to 
the computer 110. The use of multiple cameras provides the capability to 
capture multiple views of an image simultaneously or sequentially, to capture 
three-dimensional or depth images, or to capture panoramic images of a scene.
The images 193 from the one or more cameras 192 are input into the computer
110 via an appropriate camera interface 194. This interface is connected to the
system bus 121, thereby allowing the images 193 to be routed to and stored in
the RAM 132, or any of the other aforementioned data storage devices 
associated with the computer 110. However, it is noted that image data can be 
input into the computer 110 from any of the aforementioned computer-readable
media as well, without requiring the use of a camera 192. 

The computer 110 may operate in a networked environment using logical
connections to one or more remote computers, such as a remote computer 180. 
The remote computer 180 may be a personal computer, a server, a router, a 
network PC, a peer device or other common network node, and typically includes 
many or all of the elements described above relative to the computer 110, 
although only a memory storage device 181 has been illustrated in Figure 1. The 
logical connections depicted in Figure 1 include a local area network (LAN) 171 
and a wide area network (WAN) 173, but may also include other networks. Such 
networking environments are commonplace in offices, enterprise-wide computer 
networks, intranets and the Internet. 

When used in a LAN networking environment, the computer 110 is
connected to the LAN 171 through a network interface or adapter 170. When 
used in a WAN networking environment, the computer 110 typically includes a 
modem 172 or other means for establishing communications over the WAN 173, 
such as the Internet. The modem 172, which may be internal or external, may be 
connected to the system bus 121 via the user input interface 160, or other 
appropriate mechanism. In a networked environment, program modules 
depicted relative to the computer 110, or portions thereof, may be stored in the
remote memory storage device. By way of example, and not limitation, Figure 1 
illustrates remote application programs 185 as residing on memory device 181. 
It will be appreciated that the network connections shown are exemplary and 
other means of establishing a communications link between the computers may 
be used.

The exemplary operating environment having now been discussed, the
remaining part of this description will be devoted to a discussion of the program 
modules and processes embodying the present invention. 



2.0 A SYSTEM AND METHOD FOR VISUAL ECHO CANCELLATION IN A 
PROJECTOR-CAMERA-WHITEBOARD SYSTEM. 

The system and method of the invention integrates a whiteboard into a 
projector-camera system by using the whiteboard both as a writing surface and a
projecting surface. The invention captures video or a sequence of images of a 
whiteboard and separates writing on the whiteboard from the projected image. It 
is very beneficial to be able to separate whiteboard writings from the projected 
contents for a variety of reasons. For example, it dramatically reduces the 
bandwidth requirement for teleconferencing, because both extracted writing and
the computer-projected contents can be transmitted with very low bandwidth, 
compared to the original mixed video which is affected by shadows and lighting 
variations. Additionally, extracted writings are essential for archiving and 
browsing meetings offline. Writing on the whiteboard usually indicates an 
important event in a meeting. By feeding the results to an optical character
recognition (OCR) system, the meeting archive can be more easily accessed and
transferred into other forms. 

2.1 System Overview. 

FIG. 2 illustrates one embodiment of the projector-camera-whiteboard 
system 200 of the present invention. A local meeting room 202 is equipped with 
a projector 204, a camera 206, and a whiteboard 208. The projector 204 and the 
camera 206 are preferably rigidly attached to each other, although theoretically 
they can be positioned anywhere as long as the projector projects on the
whiteboard 208 and the camera sees the whole projection area (referred to as
the projector space). The projector 204 and the camera 206 are linked
(wirelessly or not) to a computer 210, and the computer is connected to the 
communication network (e.g., intranet or internet) 212. Remote attendees 214 
also connect their computers to the communication network 212. Data 
transmission between the computer in the meeting room and remote computers
is preferably conducted through real-time transport protocol (RTP). 

An electronic presentation 216 is projected on the whiteboard 208 via a 
video mixer 218 and the projector 204 and is also sent to the remote participants
214 via the network 212 for display at the remote participants' sites. A presentation
216 can, for example, be presentation slides, a spreadsheet, a PDF file, and so
on, anything that would typically be presented in a meeting, lecture, or brainstorming
session. The data stream for the presentation 220 is indicated by "P" in FIG. 2. 
Remote attendees 214 may annotate the presentation, and the annotation
stream 222 is indicated by "A". Both the presentation data stream "P" 220 and
the annotation data stream "A" 222 are mixed together via the video mixer 218 
before sending them to the projector 204 to project them on the whiteboard 208. 
During the presentation, the presenter or other local attendees may write or draw 
on the whiteboard. The camera 206 captures both the projected content and the
writings. Through geometric and color calibrations, the system predicts the
appearance of the projected "P" and "A" data streams viewed by the camera, i.e.,
a visual echo. A Visual Echo Cancellation module 224 in the computer 210 
extracts only the writings on the whiteboard 226, indicated by "W", by subtracting 
the predicted visual echo from the live video/images captured by the camera. At
the remote side, the presentation data stream "P" 220 and the whiteboard writing
stream "W" 226 are mixed via a mixer 228 before displaying on the remote 
participant's computer display 230. 



A typical process by which the system and method according to the 
invention could be employed is shown in FIG. 3. As shown in FIG. 3, process
action 302, geometric calibration data is obtained. Similarly, color calibration
data is obtained, as shown in process action 304. Images or video of the
whiteboard having projected items thereon, such as for example, a projected 
presentation and/or annotations made by remote participants, as well as writings 
written on the physical whiteboard are captured, as shown in process action 306.
Then, as shown in process action 308, a visual echo of a captured image is
computed. Visual echo cancellation is then used to isolate the writings written on
the whiteboard from the projected content (e.g., the remote participant's 
annotations and the electronic presentation) as shown in process action 310. 
The writings may be sent to the remote participants to be displayed at a remote
display in conjunction with the transmitted presentation and the annotations
made by the remote participants. Likewise, the writings may be archived for 
future viewing (process action 312). 

Various details of the system and process discussed above are provided 
in the paragraphs below.

2.2 Geometric Calibration. 

For visual echo cancellation, one needs to know the relationship between 
the position in the camera view and the position in the projector screen. This is
the task of geometric calibration. Assuming that both camera and projector are 
linear projective and that the whiteboard surface is planar, it can be easily shown 
that the mapping between a point in the camera view and a point in the projector 
screen/whiteboard is a homography, and can be described by a 3x3 matrix H
defined up to a scale factor.
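
By way of illustration only, the following minimal Python sketch shows how a point may be mapped through a 3x3 homography and why the matrix is defined only up to a scale factor. The function name apply_homography and the numeric values of H are hypothetical and are not taken from any working embodiment.

```python
def apply_homography(H, x, y):
    """Map the point (x, y) through the 3x3 homography H (nested lists)."""
    # Lift the point to homogeneous coordinates (x, y, 1) and multiply by H.
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    # The division by w cancels any common scale applied to H.
    return u / w, v / w


# Illustrative matrix only; these values are assumptions for the example.
H = [[1.2, 0.1, 30.0],
     [0.05, 1.1, 20.0],
     [0.0001, 0.0002, 1.0]]
H_scaled = [[5.0 * h for h in row] for row in H]

p = apply_homography(H, 100.0, 50.0)
q = apply_homography(H_scaled, 100.0, 50.0)
```

Because the homogeneous division cancels a common factor, p and q denote the same image point, which is why only eight of the nine entries of H are independent.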

For geometric calibration, it is assumed that both the camera and the 
projector are linear projective, and a robust, accurate and simple technique is 
implemented by leveraging the fact that the projector can actively project desired 
patterns.

FIG. 4 shows a simplified flowchart for the geometric calibration process
employed by the system and method according to the invention. In one working 
embodiment the whole geometric calibration process takes less than two minutes
and is only necessary when the camera is moved with respect to the projector. As
shown in FIG. 4, process action 402, initially N rectangles are sequentially
projected onto the whiteboard and their images are simultaneously captured
using a fixed camera. In one working embodiment of the invention N=40 was
used. Next, as shown in process action 404, the four corners of each of the
rectangles are detected in the images, as discussed in the paragraph below.
Then, the 4xN detected corners and their corresponding known positions in the
projector space are used to estimate the homography between the projector 
screen/whiteboard and the image plane of the camera (process action 406). The 
projector space can be viewed as the virtual plane of the computer display
screen. Since one can control where to display the rectangles on the computer
screen, the positions of the corners in the projector space are known. It should 
be noted that in theory, only four points (i.e., one rectangle) are necessary to 
estimate the homography. In order to achieve higher accuracy, the system and 
method of the invention uses a greater number of rectangles that are projected at
different locations of the whiteboard. Compared to other geometric calibration
methods, the system and method according to the invention takes advantage of 
the fact that the relative position between the camera and the projection surface 
is fixed during the calibration. Therefore, correspondences detected in different 
images can be used for estimating a single homography, which increases the
accuracy and robustness of the system and method without complicating the
corner detection algorithm. 
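
The estimation of a single homography from all 4xN correspondences may be sketched as follows. This is a minimal illustration under stated assumptions, not the estimator of any working embodiment: it fixes the last entry of H to 1 and solves the resulting linear least-squares problem through the normal equations, and the names solve_linear, estimate_homography and project, as well as the synthetic rectangles, are hypothetical.

```python
def solve_linear(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def estimate_homography(src, dst):
    """Least-squares homography mapping src[i] -> dst[i], with H[2][2] fixed to 1."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear equations in the 8 unknowns.
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        rhs.append(v)
    # Normal equations (A^T A) h = A^T b.
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    Atb = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(8)]
    h = solve_linear(AtA, Atb) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def project(H, x, y):
    """Apply H to (x, y), including the homogeneous division."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Synthetic check: corners of three rectangles in normalized projector coordinates.
H_true = [[0.9, 0.1, 0.05], [-0.05, 1.1, 0.02], [0.1, 0.05, 1.0]]
projector_corners = []
for x0, y0 in [(0.1, 0.1), (0.5, 0.2), (0.3, 0.6)]:
    projector_corners += [(x0, y0), (x0 + 0.3, y0),
                          (x0 + 0.3, y0 + 0.2), (x0, y0 + 0.2)]
camera_corners = [project(H_true, x, y) for x, y in projector_corners]
H_est = estimate_homography(projector_corners, camera_corners)
```

With noise-free correspondences H_est reproduces H_true to within rounding error. With pixel coordinates, normalizing the points before forming the normal equations is generally advisable for numerical stability; that step is omitted here for brevity.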

2.2.1. Corner Detection for Geometric Calibration. 

One process that can be used for corner detection is shown in FIG. 5.
The corner detection process begins by converting the color images to grayscale
images, as shown in process action 502. In order to reduce the noise in the edge map and
to increase the contrast, one needs to find the region inside and outside the
projected rectangle and quantize the grayscale value. Since the inside region is 
bright and homogeneous, it forms peak $p_1$ at the higher range of the histogram,
while the background forms peak $p_2$ at the lower range. A coarse-to-fine
histogram-based method is used to find the two peaks and set the higher
threshold $h_1 = \frac{3}{4} p_1 + \frac{1}{4} p_2$ and the lower threshold
$h_2 = \frac{1}{4} p_1 + \frac{3}{4} p_2$. The grayscale level of all pixels
above $h_1$ is set to $h_1$, while those below $h_2$ are set to $h_2$, and those
in between remain unchanged. The edges of the rectangles
are then detected in the grayscale image using one of the conventional methods
of edge detection to create an edge map (process action 504). A Hough
transform is then used to detect straight lines on the edge map (process action
506). A quadrangle is then fit using the lines, as shown in process action 508. 
The corners of the quadrangle are then found (process action 510). 
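
The grayscale quantization step that precedes edge detection may be sketched as follows. The sketch assumes the two histogram peaks p1 (bright interior) and p2 (dark background) have already been found, and that the thresholds are h1 = 3/4 p1 + 1/4 p2 and h2 = 1/4 p1 + 3/4 p2; the coarse-to-fine peak search itself is not shown, and the function names and sample values are hypothetical.

```python
def clamp_thresholds(p1, p2):
    """Return (h1, h2) from the bright peak p1 and the dark peak p2."""
    h1 = 0.75 * p1 + 0.25 * p2  # higher threshold, closer to the bright peak
    h2 = 0.25 * p1 + 0.75 * p2  # lower threshold, closer to the dark peak
    return h1, h2


def quantize(gray, p1, p2):
    """Clamp every grayscale value into [h2, h1]; values in between are unchanged."""
    h1, h2 = clamp_thresholds(p1, p2)
    return [[min(max(g, h2), h1) for g in row] for row in gray]


# Tiny synthetic image; the peak values 200 and 40 are assumptions for the example.
image = [[10, 100, 250],
         [80, 160, 170]]
clamped = quantize(image, p1=200, p2=40)
```

Clamping flattens both the bright interior and the dark background, so that the subsequent edge map is dominated by the rectangle boundary rather than by noise inside either region.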

2.4 Color Calibration.

For visual echo cancellation, for a given pixel in the projector space on the 
whiteboard, one knows its corresponding position in the camera space through
the geometric calibration described above. However, one also needs to know 
what the corresponding color in projector space should look like in the captured 
images/video, which is determined by color calibration. The same color in the 
projector space appears differently in the camera, depending on where the color is
projected on the whiteboard. This is because the projector light bulb does not
produce uniform lighting, the lighting in the room is flickering and not uniform, 
and the whiteboard surface is not Lambertian. Therefore, color calibration should 
be both color- and position-dependent. 

For color calibration, pixels of the visual echo are modeled as independent
Gaussian random variables and a lookup-table-based approach is used. Note
that both geometric calibration and color calibration should also be useful for
tasks other than visual echo cancellation, such as automatic keystone correction
for projectors. 

FIG. 6 shows the flowchart for color calibration. As shown in FIG. 6,
process action 602, this process begins by quantizing the RGB color space into
bins. In one working embodiment of the invention 9 x 9 x 9 or 729 bins were 
used. In process action 604, each quantized color is projected over the whole 
display region and its image is captured in synchronization, storing n frames for
each color (process action 606). In one working embodiment 5 frames were
stored for each color. In process action 608, each image is warped to the 
projected screen coordinates using the homography H found in the geometric 
calibration. The display region is divided evenly into rectangular blocks (e.g., 
32x32, or 1024 blocks) and the mean and variance of each color in each block are
calculated across the n frames, as shown in process action 610. The mean and
variance values are entered into a lookup table for color C (process action 612)
and the next color is projected until all quantized colors have been projected 
(process actions 614 and 616).

Using the process of FIG. 6, a lookup table is built for the quantized colors
(e.g. 729) at each of the blocks (e.g. 1024 blocks). Note that the spatial
dimensionality is necessary because the same projected color will have a 
different appearance at different positions on the screen/whiteboard. The best 
result would be obtained if a lookup table were built for every pixel position, but
this seems unnecessary based on experimental data because the color
appearance changes smoothly across the display surface. 
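
One possible way to organize the color lookup table is sketched below. The sketch assumes 9 evenly spaced levels per channel (the exact bin values are not stated above), assumes the captured frames have already been warped to projector screen coordinates, and pools all pixels of a block across all frames when computing the mean and variance. The names levels, block_stats and lookup, and the tiny synthetic frames, are hypothetical.

```python
# 9 levels per channel give 9 x 9 x 9 = 729 quantized colors (level spacing assumed).
levels = [round(i * 255 / 8) for i in range(9)]
colors = [(r, g, b) for r in levels for g in levels for b in levels]


def block_stats(frames, blocks_y, blocks_x):
    """Per-block mean and variance of each RGB channel, pooled over all frames.

    frames: list of images; each image is a list of rows of (r, g, b) tuples.
    Returns {(block_row, block_col): (mean_rgb, variance_rgb)}.
    """
    height, width = len(frames[0]), len(frames[0][0])
    bh, bw = height // blocks_y, width // blocks_x
    table = {}
    for by in range(blocks_y):
        for bx in range(blocks_x):
            samples = [frame[y][x]
                       for frame in frames
                       for y in range(by * bh, (by + 1) * bh)
                       for x in range(bx * bw, (bx + 1) * bw)]
            n = len(samples)
            mean = tuple(sum(s[c] for s in samples) / n for c in range(3))
            var = tuple(sum((s[c] - mean[c]) ** 2 for s in samples) / n
                        for c in range(3))
            table[(by, bx)] = (mean, var)
    return table


# Two tiny 2x2 synthetic captures of one projected color, split into 1x2 blocks.
frames = [
    [[(100, 50, 20), (200, 60, 30)], [(100, 50, 20), (200, 60, 30)]],
    [[(102, 50, 20), (198, 60, 30)], [(102, 50, 20), (198, 60, 30)]],
]
lookup = {}  # projected color -> per-block (mean, variance)
lookup[(128, 64, 32)] = block_stats(frames, blocks_y=1, blocks_x=2)
```

In a full calibration the outer loop would run over all 729 entries of colors and over, e.g., 32x32 blocks, so the table would hold 729 x 1024 mean and variance entries.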

Usually it is sufficient to perform the color calibration procedure once.
However, it may be necessary to perform the color calibration procedure again if
certain settings for the projector (color temperature, contrast or brightness)
or the camera (exposure or white balance) are changed. Projecting and capturing
729 x n (= 3645 when n = 5) frames at 10 fps (to ensure projecting and capturing
are synchronized) takes about 6 minutes in one embodiment of the invention.
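
The frame count and duration quoted above follow from simple arithmetic, which
the short sketch below reproduces. The function names are hypothetical and the
calculation ignores any per-frame overhead beyond the stated frame rate.

```python
def calibration_frames(levels_per_channel=9, frames_per_color=5):
    """Total frames projected and captured: (levels ** 3) colors times n frames."""
    return levels_per_channel ** 3 * frames_per_color


def calibration_minutes(total_frames, fps=10):
    """Duration in minutes when frames are processed at a fixed rate."""
    return total_frames / fps / 60


total = calibration_frames()          # 729 colors x 5 frames
minutes = calibration_minutes(total)  # a little over 6 minutes at 10 fps
```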



2.5. Visual Echo Cancellation. 

The following paragraphs describe the visual echo estimation and 
cancellation procedures. 

2.5.1 Visual Echo Estimation. 

Given arbitrary display content of a captured image (process action 702),
the system and method of the invention estimates the visual echo E, as shown in
FIG. 7, by initially substituting each pixel with its corresponding mean color
in the lookup table defined in the color calibration procedure (process action
704). For colors not in the table, linear interpolation of the two nearest bins
is used. Then, as shown in process action 706, each pixel in the image is warped
to the camera view. To obtain an estimate of the error bound for each pixel, one
also looks up and warps the variances to get a pixel-wise variance map V
(process action 708).
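
The lookup with linear interpolation between the two nearest bins can be
sketched for a single color channel as follows. The text does not specify how
the interpolation is carried out over the three-dimensional color space, so this
one-dimensional version, the function name lookup_mean, and the clamping of
values outside the calibrated range are assumptions made for illustration.

```python
from bisect import bisect_right


def lookup_mean(value, bin_values, bin_means):
    """Interpolate the calibrated mean for one channel value.

    bin_values: sorted projected intensities of the calibrated bins.
    bin_means:  captured mean intensity measured for each bin.
    """
    if value <= bin_values[0]:
        return bin_means[0]
    if value >= bin_values[-1]:
        return bin_means[-1]
    hi = bisect_right(bin_values, value)  # first bin strictly above value
    lo = hi - 1                           # nearest bin at or below value
    t = (value - bin_values[lo]) / (bin_values[hi] - bin_values[lo])
    return (1 - t) * bin_means[lo] + t * bin_means[hi]
```

The same routine can be applied to the stored variances to obtain the entries of
the variance map before warping.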

2.5. Visual Echo Cancellation

FIG. 8 shows a flowchart of the general visual echo cancellation process
according to the present invention. The details are explained in the following
subsections. As shown in FIG. 8, process action 802, an image of the whiteboard
for which the visual echo was calculated is input into the system. This image
contains the projected content, such as a presentation and any annotations, as
well as any whiteboard writings. The projected content (e.g., projected
presentation and annotations) of the image is compared with the corrected image
(visual echo) using the ratio of the albedo of the captured image to the albedo
of the visual echo, as shown in process action 804. Using the albedo ratio, the
writings on the whiteboard (as opposed to the projected content) are identified
and their color is recovered, as shown in process action 806.

2.5.1. Generative Process of the Captured Image.

By writing/drawing with a marker on a whiteboard, one actually changes
the surface albedo of the whiteboard, and therefore changes the reflection.
In a general sense, then, extracting the writings on the whiteboard boils
down to detecting the changes of the surface albedo.

Assuming all the images are geometrically aligned, and denoting the
incident light map by P, the surface albedo of the whiteboard by A, the
pixel-wise color transformation due to the camera sensor by C, and the visual
echo by E, one has $E = C \times A \times P$. If nothing is written on the
whiteboard, then the captured image I should be equal to E. If there is anything
written on the whiteboard, the surface albedo changes, and is denoted by
$\tilde{A}$. The captured image can then be described by
$I = C \times \tilde{A} \times P$. One can compute the albedo change by
estimating the albedo ratio $a = \tilde{A}/A$ of the pixel [x,y] in color
channel $c \in \{R,G,B\}$. Because C and P cancel when I is divided by E, the
ratio is given by

$$a_{[x,y],c} = \frac{I_{[x,y],c}}{E_{[x,y],c}} \qquad (1)$$

Note that writings on the whiteboard absorb the light, so $\tilde{A} < A$, and
in consequence $a_{[x,y],c} < 1$.



Based on the albedo ratio a, one can detect the writings and recover their
color. The albedo ratio for the whiteboard region without writings should be 1.
Assuming the sensor noise on the albedo is additive and has a zero-mean
Gaussian distribution with variance $V/E$, the following decision rule results:

Pixel [x,y] belongs to the written region if and only if

$$1 - \frac{a_{[x,y],R} + a_{[x,y],G} + a_{[x,y],B}}{3} > \frac{V_{[x,y],R} + V_{[x,y],G} + V_{[x,y],B}}{E_{[x,y],R} + E_{[x,y],G} + E_{[x,y],B}} \qquad (2)$$

Note that the decision rule is one-sided, because, as mentioned earlier, the
albedo ratio for a written whiteboard region is strictly less than 1.

For each pixel [x,y] that belongs to the written region, one can recover
the writings with its colors as

$$W_{[x,y],c} = a_{[x,y],c} \times 255 \qquad (3)$$

assuming the color intensity ranges from 0 to 255.
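
The per-pixel test and color recovery described above can be sketched as
follows. This is an illustration only: the function names are hypothetical, the
albedo ratio is taken as the captured value divided by the visual echo value in
each channel, the echo values are assumed to be positive, and the clamping of
recovered intensities to 255 is an added assumption to guard against noise.

```python
def albedo_ratio(captured, echo):
    """Per-channel ratio of captured intensity to estimated visual echo."""
    return tuple(i / e for i, e in zip(captured, echo))


def is_written(captured, echo, variance):
    """One-sided decision rule: mean albedo drop exceeds the noise bound."""
    a = albedo_ratio(captured, echo)
    return 1 - sum(a) / 3 > sum(variance) / sum(echo)


def recover_color(captured, echo):
    """Scale the albedo ratio to the 0 to 255 intensity range."""
    return tuple(min(255, round(a * 255)) for a in albedo_ratio(captured, echo))


# A pixel darkened by a marker stroke versus a pixel showing only projection.
echo = (200, 200, 200)
variance = (4, 4, 4)
stroke = is_written((40, 80, 120), echo, variance)    # True
blank = is_written((199, 201, 200), echo, variance)   # False
```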

2.5.2. Practical Considerations. 

Due to the noise in geometric calibration, I and E are not exactly
aligned. The 1 to 2 pixel errors are most evident near strong edges in E.
Therefore, in written region segmentation, one first applies an erosion on E,
which increases the dark region. Erosion is a standard image morphological
operation. In a binary image, it will assign a 0 (black) value to pixels with a
certain neighborhood pattern. It has been extended to grayscale and color
images. The system and method of the invention use its extension to color
images. Thus, the pixels near the dark regions in E have a higher albedo ratio a
and are less likely to be classified as written region. This preprocessing
reduces error because, in order to make the writings more visible, most users
prefer to write on top of a brighter background instead of a darker background.
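
One common way to extend erosion to a color image is to take the per-channel
minimum over a small neighborhood, which enlarges dark regions. The sketch
below uses that approach as an assumption; the text does not state which color
extension or neighborhood size is used, and the function name erode and the
default 3x3 window are chosen only for illustration.

```python
def erode(image, radius=1):
    """Per-channel minimum over a (2 * radius + 1)-square neighborhood.

    image: list of rows of (R, G, B) tuples. The window is clipped at the
    image border rather than padded.
    """
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            neighbors = [
                image[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(tuple(min(p[ch] for p in neighbors) for ch in range(3)))
        result.append(row)
    return result
```

Applied to the estimated visual echo, a single dark pixel grows into a 3x3 dark
patch, so pixels bordering a dark edge are compared against a darker reference.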

In practice, to make the colors in W visible, one needs to set the camera
exposure to be much higher than normal. This will cause over-exposure
during color calibration. The system and method of the invention addresses this
problem by setting the exposure optimal for color calibration, and using a
classification method to recover the colors of the writings. The four most
commonly used markers (red, black, blue and green) are chosen as classes
$M_0$ through $M_3$. For supervised training, one uses Equations (2) and (3) to
recover a set of writings W, and then converts it from RGB color space to HSI
(hue, saturation and intensity) color space, and denotes the new image as W'.
The training data for class $M_i$ is labeled manually by selecting the region
written by marker i, and collecting its histogram $n_i(h,s,i)$.

To classify a pixel $W_{[x,y]}$ obtained from Equation (3), its RGB value is
converted to $W'_{[x,y]}$ in HSI space and the likelihood that it belongs to
cluster i is evaluated as

$$p([x,y] \mid M_i) = \frac{n_i(W'_{[x,y],h},\, W'_{[x,y],s},\, W'_{[x,y],i})}{N_i}, \quad \text{for } i = 0, \ldots, 3, \qquad (4)$$

where $N_i$ is the total number of data points in histogram i.

Due to noise in the camera sensor, a MAP decision rule may not give spatially
consistent results, so a 61x61 window is used to collect votes from all the
pixels in the neighborhood and to classify the center pixel based on the
maximum votes.
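
The histogram-based classification followed by neighborhood voting can be
sketched as follows. The sketch assumes that each pixel has already been
converted to a quantized (h, s, i) bin, that each class histogram is a
dictionary from bin to count, that all classes have equal priors, and that only
pixels in the written region cast votes. These choices and the function names
are illustrative assumptions.

```python
from collections import Counter


def most_likely_class(hsi_bin, histograms):
    """Return the class index i that maximizes n_i(bin) / N_i."""
    scores = [hist.get(hsi_bin, 0) / sum(hist.values()) for hist in histograms]
    return max(range(len(scores)), key=lambda i: scores[i])


def vote(labels, y, x, half_window=30):
    """Majority label in the window centered on (y, x).

    half_window=30 gives the 61x61 window. labels holds a class index for
    each written pixel and None elsewhere; None entries do not vote.
    """
    height, width = len(labels), len(labels[0])
    counts = Counter(
        labels[ny][nx]
        for ny in range(max(0, y - half_window), min(height, y + half_window + 1))
        for nx in range(max(0, x - half_window), min(width, x + half_window + 1))
        if labels[ny][nx] is not None
    )
    return counts.most_common(1)[0][0] if counts else None
```

Labeling every written pixel with most_likely_class and then replacing each
label with the result of vote smooths out isolated misclassifications caused by
sensor noise.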

3.0 Experimental Results.

The geometric calibration method was tested using various projectors
(including an InFocus LP530 and a Proxima DP6155) and various video cameras
(including an Aplex USB2, a Logitech Pro4000 and a SONY EVI30), under both
artificial lighting and natural lighting conditions. The fitting error for
solving the homography based on correspondences ranges from 0.3 to 0.7 pixels.

For color calibration, a SONY projector and EVI30 camera were used.
Comparing the estimated visual echo E with the actual captured image I, the
average error is around 3 (color intensity range 0~255). The majority of the
discrepancy is around the regions with strong edges, due to the noise in
geometric calibration.

FIG. 9 shows the visual echo cancellation results on various backgrounds.
One can see that the majority of the writings are recovered except for the parts
on top of extremely complex backgrounds. In this case, however, it is difficult
even for the human eye to discern the writings.

The foregoing description of the invention has been presented for the
purposes of illustration and description. It is not intended to be exhaustive or
to limit the invention to the precise form disclosed. Many modifications and
variations are possible in light of the above teaching. It is intended that the
scope of the invention be limited not by this detailed description, but rather
by the claims appended hereto.






